Lung cancer in never-smokers ranks as the seventh most common cause of cancer death
worldwide, and the incidence of lung cancer in non-smoking Korean women appears to be
steadily increasing. To identify the effect of genetic polymorphisms on lung cancer risk in
non-smoking Korean women, we conducted a genome-wide association study of Korean
female non-smokers with lung cancer. We analyzed genotype data for 440,794 SNPs in 285 cases
and 1,455 controls, and nineteen SNPs were associated with lung cancer development
(P < 0.001). For external validation, the nineteen SNPs were genotyped in another sample set
composed of 293 cases and 495 controls, and only rs10187911 on 2p16.3 was significantly
associated with lung cancer development (dominant model, OR of TG or GG, 1.58, P =
0.025). We confirmed this SNP again in another replication set composed of 546 cases and
744 controls (recessive model, OR of GG, 1.32, P = 0.027). The OR and P value in the combined
set were 1.37 and < 0.001 in the additive model, 1.51 and < 0.001 in the dominant model, and
1.54 and < 0.001 in the recessive model. The effect of this SNP was consistent when
only adenocarcinoma patients were analyzed (1.36 and < 0.001 in the additive model, 1.49 and
< 0.001 in the dominant model, and 1.54 and < 0.001 in the recessive model). Furthermore, after
imputation with HapMap data, we found regional significance near rs10187911, and five
SNPs showed P values lower than that of rs10187911 (rs12478012, rs4377361, rs13005521,
rs12475464, and rs7564130). Therefore, we concluded that a region on chromosome 2 is
significantly associated with lung cancer risk in Korean non-smoking women.

INTRODUCTION

Lung cancer is the leading cause of cancer death worldwide,
with over 1 million deaths each year. Although cigarette smoking
is the major cause of this malignancy, global statistics estimate
that 15% of lung cancers in men and 53% in women are
not directly attributable to smoking (1). The incidence of lung
cancer in Korean women appears to be steadily increasing, as
in Asian women generally, and the majority of these women
are non-smokers. Moreover, lung cancer in Korean women appears
to be predominantly adenocarcinoma. Gender, clinicopathological,
and molecular differences between lung cancers arising
in never-smokers and smokers indicate that lung cancer in
non-smoking women might be a different disease (2, 3). Recent
evidence for genetic influence on smoking behavior and nicotine
dependence has prompted a search for susceptibility genes
related to lung cancer development (4). Although several
genome-wide association (GWA) studies found three human genome
regions, at chromosome 5p15, 15q25, and 6p21, to be associated
with susceptibility to lung cancer (4-14), only 5p15.33 has been
replicated as a susceptibility locus for lung cancer in
never-smokers. Furthermore, there was no information for Korean
non-smoking women. For these reasons, we conducted a genome-wide
association study (GWAS) of female non-smoking Koreans with lung
cancer to identify low-penetrance alleles influencing lung cancer
risk in Korean non-smoking women.

MATERIALS AND METHODS 
Study populations 

A total of 286 non-smoking women with newly diagnosed lung
cancer at Kyungpook National University Hospital (30 cases),
Korea University Anam Hospital (39 cases), Seoul National
University Hospital (140 cases), and Seoul National University
Bundang Hospital (77 cases) in Korea between 2001 and 2008 were
recruited for GWAS. There were no age, histologic, or stage
restrictions, and all lung cancer cases were histologically
confirmed. The control subjects for GWAS were 1,462 non-cancer
participants in the Korean Association REsource (KARE)
of the Korea Centers for Disease Control and Prevention (KCDC).
They were matched to the cases by sex and smoking status
(case:control = 1:5). For the replication studies, we recruited
new populations: 293 cases and 495 controls for replication 1 (70 cases
and 100 controls in Kyungpook National University Hospital; 64
cases in Korea University Anam Hospital; 115 cases in Seoul
National University Hospital; and 44 cases and 395 controls in
Seoul National University Bundang Hospital) and 546 cases
and 744 controls for replication 2 (84 cases and 43 controls in
Kyungpook National University Hospital; 44 cases in Korea
University Anam Hospital; 90 cases in Seoul National University
Hospital; 57 cases and 232 controls in Seoul National University
Bundang Hospital; and 271 cases and 469 controls in Korea
National Cancer Center). All participants were never-smoking
women, and information on age, sex, smoking history, and disease
status was collected by trained interviewers working
in each center.

GWA genotyping 

DNA was extracted from whole blood samples using a QIAamp®
DNA Blood Midi Kit (Qiagen, Valencia, CA, USA). SNP genotyping
was performed using the Affymetrix Genome-Wide Human
SNP array 5.0 comprising 440,794 genome-wide SNPs (Affymetrix,
Santa Clara, CA, USA) according to the manufacturer's protocol.
Total genomic DNA (500 ng) was digested with Nsp I
and Sty I restriction enzymes and ligated to specific adaptors
which incorporate a universal PCR priming sequence. PCR
amplification was performed with universal primers in a reaction
optimized to amplify fragments between 200 and 1,100 base
pairs. A fragmentation step then reduced the PCR product to
segments of approximately 25-50 bp, which were then end-labeled
using biotinylated nucleotides. The labeled product was
hybridized to a chip, washed, and detected. The images were
analyzed using GCOS software (Affymetrix, Santa Clara, CA,
USA). For quality control (QC), two algorithms were used: the
QC call rates for each array exceeded 86% using the Dynamic
Model algorithm, and genotype calls for each site were more
than 98% using the Birdseed v2 genotyping call algorithm.

GWA data mining 

We genotyped 440,794 SNPs in 286 cases using Affymetrix GeneChip
5.0 and obtained chip data of 1,462 controls from KARE of
KCDC. First, we checked relationships among subjects using
graphical representation of relationship errors (GRR) software
and excluded 8 subjects (7 controls from KARE and 1 case from
Seoul National University Bundang Hospital) whose mean
identical-by-state (IBS) allele sharing over a number of
polymorphic loci with another individual exceeded 1.5. We
finally analyzed 285 cases and 1,455 controls using the PLINK v1.07
program. Cutoff values for the Hardy-Weinberg equilibrium (HWE) P value,
call rate, and minor allele frequency were 0.001, 0.95, and 0.01,
respectively, and 331,088 SNPs were used in the analysis.
Because of our small sample size, we used a 10-fold cross-validation
method (repeated random selection of 90% of the total population
for validation, 10 trials) to protect against false positives
in logistic regression analysis. In all analyses, we adjusted for
age in the model. Finally, we selected 19 SNPs with P values below
0.0001 for replication in other sample sets after considering
each genotype scatter plot. All nineteen SNPs were consistent
across all ten cross-validation trials at P < 0.001 and were
confirmed to be called accurately. A quantile-quantile plot using P
values obtained from the additive model in logistic regression
analyses revealed no difference between the observed P values and those
expected by chance (Fig. 1). A small inflation factor (λ) of 1.019
(or 1.000, based on the bottom 90% of the quantile-quantile plot)
indicated that the association we found is unlikely to have
resulted from population stratification. Population structure was
evaluated with PLINK v1.07, and the quantile-quantile plot was
generated using R 2.13.0. We used the MACH 1.0.16 software to
impute untyped SNPs using the linkage disequilibrium
information from the HapMap phase II database (CHB+JPT was
used as the reference set, released 17 July 2006).
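
The inflation factor mentioned above is conventionally computed as the median of the observed 1-df chi-square statistics divided by the median expected under the null hypothesis (about 0.455). The following is a minimal sketch of that convention, not the procedure actually run in PLINK or R for this study; the function names are hypothetical, and the P values are assumed to be two-sided and strictly greater than zero.

```python
from math import log10
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom:
# P(Z^2 <= m) = 0.5 is equivalent to Phi(sqrt(m)) = 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.4549


def p_to_chi2_1df(p):
    """Convert a two-sided P value (0 < p <= 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile; its sign is dropped by squaring
    return z * z


def inflation_factor(p_values):
    """Median observed chi-square divided by the median expected under the null."""
    return median(p_to_chi2_1df(p) for p in p_values) / CHI2_1DF_MEDIAN


def expected_neg_log10(n):
    """Expected -log10(P) values for a quantile-quantile plot of n tests, largest first."""
    return [-log10((i + 0.5) / n) for i in range(n)]
```

A value close to 1, such as the 1.019 reported here, indicates that the bulk of the test statistics follows the null distribution.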

Replication genotyping

From the GWA analysis, we selected 19 SNPs (rs10187911, rs13005521,
rs1484322, rs755927, rs2191876, rs2378688, rs303451,
rs41393946, rs7928853, rs7305016, rs8041151, rs17258206, rs17258247,
rs9939608, rs7241996, rs10485620, rs2884026, rs925338,
and rs8068946) associated with lung cancer development under
a cutoff P value of 0.0001 and genotyped those SNPs in samples
obtained from new populations (replication set 1 and
replication set 2).




Fig. 1. Quantile-quantile plot of P value.

Table 1. Primers and PCR conditions for amplification and RFLP

rsN         Direction  Sequence                             Annealing temperature (°C)  Restriction enzyme
rs13005521  Forward    5'-GGATTTTCATATAGMGAGAAGGAG-3'       58                          MalV
            Reverse    5'-CTGTGGGAATGTAGGTTAGTTC-3'
rs2191876   Forward    5'-CGCAAGAATCTTAAAACCTCCTCTG-3'      60                          HpyCH4III
            Reverse    5'-GAGTGTCATCAGAAAGATACATCAGTC-3'
rs2884026   Forward    5'-TGCAGATATTAATCTGGTTATC-3'         52.8                        Hpy188III
            Reverse    5'-AATTGTTCATAACTCGATAGTC-3'
rs8068946   Forward    5'-TGCAGATATTAATCTGGTTATC-3'         56                          Hpy188III
            Reverse    5'-AATTGTTCATAACTCGATAGTC-3'
rs925338    Forward    5'-CTTTCATCCTCTTCTCTAGTGTG-3'        60                          MspI
            Reverse    5'-CTTCCAGGACAGCTTCTGC-3'

PCR, polymerase chain reaction; RFLP, restriction fragment length polymorphism.

Replication 1. The 19 SNPs were evaluated in the replication study.
Of the 19 SNPs, 14 variants (rs10187911, rs1484322, rs755927,
rs2378688, rs303451, rs41393946, rs7928853, rs7305016, rs8041151,
rs17258206, rs17258247, rs9939608, rs7241996, and rs10485620) were
selected by the MassARRAY Assay Design software package (v3.1)
and genotyped using SEQUENOM's MassArray iPLEX technology
(SEQUENOM) with negative controls following the manufacturer's
instructions. Genotype calls were made using the
default post-processing calling parameters in SEQUENOM's
Typer 4.0 software. Genotyping of the other five SNPs (rs13005521,
rs2191876, rs2884026, rs8068946, and rs925338) was performed
using polymerase chain reaction (PCR) and restriction fragment
length polymorphism (RFLP) analysis (Table 1) with negative
controls. For quality control, the genotyping analysis was
performed blind with regard to the subjects. In addition,
approximately 5% of the samples were randomly selected to be
genotyped again by SEQUENOM's MassArray iPLEX technology and
PCR-RFLP, and the results showed 100% concordance.

Replication 2. Because only rs10187911 of the 19 replicated SNPs
showed statistical significance in replication set 1, only
rs10187911 was genotyped using the TaqMan fluorogenic 5' nuclease
assay (ABI) in replication set 2. The final volume of the
polymerase chain reaction (PCR) was 5 µL, containing 10 ng of
genomic DNA and 2.5 µL TaqMan Universal PCR Master Mix,
with 0.13 µL of 40X Assay Mix (Assay ID, C_30098973_10).
Thermal cycle conditions were as follows: 50°C for 2 min to activate
the uracil N-glycosylase and to prevent carry-over contamination,
95°C for 10 min to activate the DNA polymerase, followed
by 45 cycles of 95°C for 15 sec and 60°C for 1 min. All PCRs were
performed using 384-well plates on a Dual 384-Well GeneAmp
PCR System 9700 (ABI), and the endpoint fluorescent readings
were performed on an ABI PRISM 7900 HT Sequence Detection
System (ABI). Duplicate samples and negative controls were
included to ensure accuracy of genotyping.

Statistical analyses for replication study

Unconditional logistic regression was used to estimate odds ratios
(OR) and 95% confidence intervals (CI) after adjustment for
age in lung cancer patients and control subjects. To find out
whether each SNP site was in HWE, the distributions of the
observed genotype frequency and the expected genotype frequency
calculated from the observed allele frequency were compared
using the χ² test. SAS, version 9.1 (SAS Institute Inc., Cary, NC, USA),
was used for statistical analysis.
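
The HWE comparison described above can be illustrated with a short sketch. It is only an illustration of the standard 1-df chi-square goodness-of-fit test for a biallelic SNP, not the SAS code used in the study; the function name is hypothetical, and both alleles are assumed to be present so that no expected count is zero.

```python
from math import erfc, sqrt


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium from three genotype counts.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # observed frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the 1-df chi-square distribution.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Example input: genotype counts of rs10187911 in the GWAS controls (Table 4).
chi2, p_value = hwe_chi2_test(283, 699, 473)
```

A SNP would be excluded when the resulting P value falls below the chosen cutoff (0.001 in the GWA data mining step).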

Ethics statement 

The study protocol was approved by the institutional review 
board at Seoul National University Hospital (IRB No. C-0903- 
037-274), and written informed consent was provided by all 
study participants. 

RESULTS 

We analyzed 285 cases and 1,455 controls using Affymetrix GeneChip
5.0. To protect against false positives, we used 0.001, 0.95,
and 0.01 as cutoff values for the HWE P value, call rate, and minor
allele frequency, respectively, as well as the 10-fold cross-validation
method, and then considered each genotype scatter plot. Finally,
we selected 19 SNPs with P values below 0.0001 for replication in
other sample sets. Table 2 shows a summary of the 19 SNPs
significantly associated with lung cancer development in GWAS. For
validation, the nineteen SNPs were genotyped in another sample
set, replication 1, which was composed of 293 cases and 495
controls, and only one SNP (rs10187911 on 2p16.3) was
significantly associated with lung cancer development
in the dominant model (OR of TG or GG, 1.58, P = 0.025)
(Table 3). We reconfirmed this SNP in another sample set,
replication 2, composed of 546 cases and 744 controls (total 1,290)
(recessive model, OR of GG, 1.32, P = 0.027) (Table 4). The OR and P
value in the combined set (total set) were 1.37 and < 0.001 for the
additive model, 1.51 and < 0.001 for the dominant model, and 1.54
and < 0.001 for the recessive model (Table 4). We also assessed the
effect of the SNP by center (Fig. 2). All five centers showed
consistent results both in the total population and in adenocarcinoma
cases (additive model, OR in total population, 1.37 vs OR in
adenocarcinoma, 1.36).
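
To make the dominant and recessive contrasts concrete, the following sketch computes crude (unadjusted) odds ratios from genotype counts, with G as the effect allele as in the text (dominant: TG or GG vs TT; recessive: GG vs TG or TT). The function names are hypothetical, the confidence interval uses the standard Woolf (log) method, and no age adjustment is made, so the values are not expected to match the logistic regression estimates exactly.

```python
from math import exp, log, sqrt


def crude_or(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Unadjusted odds ratio with a Woolf 95% confidence interval."""
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se = sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = exp(log(odds_ratio) - 1.96 * se)
    upper = exp(log(odds_ratio) + 1.96 * se)
    return odds_ratio, lower, upper


def model_or(cases, controls, model):
    """Collapse GG/GT/TT counts according to the genetic model, then compute the OR."""
    if model == "dominant":  # carriers of G (GG or GT) vs TT
        return crude_or(cases["GG"] + cases["GT"], cases["TT"],
                        controls["GG"] + controls["GT"], controls["TT"])
    if model == "recessive":  # GG vs GT or TT
        return crude_or(cases["GG"], cases["GT"] + cases["TT"],
                        controls["GG"], controls["GT"] + controls["TT"])
    raise ValueError("model must be 'dominant' or 'recessive'")


# Genotype counts of rs10187911 in the total set (Table 4).
cases = {"GG": 353, "GT": 518, "TT": 227}
controls = {"GG": 640, "GT": 1282, "TT": 755}
dominant = model_or(cases, controls, "dominant")
recessive = model_or(cases, controls, "recessive")
```

The additive model is not covered by this sketch because it requires a per-allele trend estimate from logistic regression rather than a single 2 x 2 table.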

Because we confirmed the effect of rs10187911 on lung cancer
development in all sets (GWAS, replication 1, replication 2,
and total population), we generated a regional plot around
rs10187911 with genotypes imputed using HapMap data to increase
the spectrum of variants tested for association in this region
(Fig. 3). After imputation, we found regional significance
near rs10187911: rs10187911 and six SNPs located in
the same intron of the NRXN1 gene all showed P values
< 0.001. Among these six SNPs, five (rs12478012, rs4377361,
rs13005521, rs12475464, and rs7564130) showed P values lower
than that of rs10187911, and one (rs10048731) showed a P
value higher than that of rs10187911.

DISCUSSION 

We found a new SNP (rs10187911) at 2p16.3 as a susceptibility
marker for lung cancer in never-smoking Korean females. Since
most cases in our population were adenocarcinoma patients 



Table 2. Summary of nineteen SNPs significantly associated with lung cancer development in GWAS

                                                                      Additive model              Dominant model              Recessive model
rsN         Cytoband  Gene                Position (bp)  Ref. allele  OR (95% CI)        P value  OR (95% CI)        P value  OR (95% CI)         P value
rs10187911  2p16.3    NRXN1               50647947       G            1.47 (1.22-1.78)   < 0.001  1.71 (1.25-2.34)   < 0.001  1.66 (1.21-2.26)    0.002
rs13005521  2p16.3    NRXN1               50595856       C            1.52 (1.25-1.84)   < 0.001  1.75 (1.25-2.46)   0.001    1.73 (1.28-2.33)    < 0.001
rs2884026   2q32.3    TMEFF2              192930360      C            1.38 (1.13-1.69)   0.002    1.84 (1.36-2.48)   < 0.001  1.12 (0.76-1.65)    0.576
rs1484322   4p14      UBE2K, C4orf34      39671142       C            0.67 (0.55-0.82)   < 0.001  0.75 (0.57-1.00)   0.048    0.34 (0.21-0.54)    < 0.001
rs755927    4p16.2    CYTL1, MSX1         4943456        A            0.53 (0.39-0.72)   < 0.001  0.51 (0.36-0.70)   < 0.001  0.41 (0.12-1.40)    0.154
rs2191876   7p14.3    PDE1C               31882928       A            2.09 (1.45-3.00)   < 0.001  2.12 (1.46-3.09)   < 0.001  3.60 (0.35-37.63)   0.284
rs2378688   9q21.33   AGTPBP1, NTRK2      88118037       C            0.66 (0.54-0.81)   < 0.001  0.66 (0.50-0.87)   0.003    0.44 (0.28-0.68)    < 0.001
rs303451    10p11.23  MAP3K8, LYZL2       30777280       A            2.67 (1.68-4.22)   < 0.001  2.67 (1.68-4.22)   < 0.001  -                   -
rs41393946  11p14.1   KCNA4, METT5D1      29017564       C            1.57 (1.27-1.95)   < 0.001  1.83 (1.39-2.40)   < 0.001  1.53 (0.92-2.56)    0.105
rs7928853   11p14.1   KCNA4, METT5D1      29029556       C            1.53 (1.24-1.89)   < 0.001  1.78 (1.35-2.34)   < 0.001  1.46 (0.87-2.44)    0.149
rs7305016   12q13.13  SMAGP               51642543       C            0.65 (0.53-0.81)   < 0.001  0.64 (0.49-0.84)   0.001    0.44 (0.27-0.72)    0.001
rs8041151   15q26.2   MCTP2, RGMA         94626455       A            1.51 (1.24-1.83)   < 0.001  1.79 (1.31-2.46)   < 0.001  1.67 (1.21-2.30)    0.002
rs17258206  16q21     LOC644649, GOT2     59381595       A            0.71 (0.58-0.87)   < 0.001  0.56 (0.42-0.74)   < 0.001  0.86 (0.58-1.28)    0.463
rs17258247  16q21     LOC644649, GOT2     59394794       A            0.70 (0.57-0.86)   < 0.001  0.56 (0.43-0.74)   < 0.001  0.83 (0.56-1.24)    0.368
rs925338    16q21     LOC644649, GOT2     59349532       A            0.71 (0.58-0.87)   0.001    0.56 (0.43-0.74)   < 0.001  0.88 (0.59-1.30)    0.513
rs9939608   16q21     LOC644649, GOT2     59392020       C            0.70 (0.57-0.85)   < 0.001  0.56 (0.42-0.74)   < 0.001  0.81 (0.54-1.22)    0.319
rs8068946   17q24.3   SLC39A11            70712513       A            0.73 (0.59-0.89)   0.002    0.57 (0.43-0.75)   < 0.001  0.93 (0.63-1.38)    0.721
rs7241996   18q12.2   LOC647946, MIR4318  35490215       A            0.53 (0.40-0.70)   < 0.001  0.52 (0.38-0.71)   < 0.001  0.20 (0.05-0.87)    0.032
rs10485620  20q13.2   ZFP64, SALL4        50565168       C            2.00 (1.53-2.61)   < 0.001  2.07 (1.54-2.79)   < 0.001  3.76 (1.56-9.08)    0.003

OR, odds ratio; CI, confidence interval; GWAS, genome-wide association study.
Table 3. Summary of nineteen SNPs in another sample set, replication 1

                                                                      Additive model  Dominant model  Recessive model
rsN         Cytoband  Gene                Position (bp)  Ref. allele  OR    P value   OR    P value   OR    P value
rs10187911  2p16.3    NRXN1               50647947       G            1.23  0.067     1.58  0.025     1.16  0.392
rs13005521  2p16.3    NRXN1               50595856       C            1.08  0.477     1.16  0.416     1.07  0.724
rs2884026   2q32.3    TMEFF2              192930360      C            1.05  0.667     1.05  0.818     1.08  0.660
rs1484322   4p14      UBE2K, C4orf34      39671142       C            0.93  0.544     0.88  0.549     0.93  0.681
rs755927    4p16.2    CYTL1, MSX1         4943456        A            0.99  0.967     1.24  0.644     0.96  0.821
rs2191876   7p14.3    PDE1C               31882928       A            1.06  0.798     1.13  0.591     -     -
rs2378688   9q21.33   AGTPBP1, NTRK2      88118037       C            0.93  0.529     0.96  0.817     0.82  0.381
rs303451    10p11.23  MAP3K8, LYZL2       30777280       A            0.72  0.332     -     -         0.72  0.332
rs41393946  11p14.1   KCNA4, METT5D1      29017564       C            0.98  0.875     1.29  0.487     0.93  0.619
rs7928853   11p14.1   KCNA4, METT5D1      29029556       C            1.00  0.971     1.40  0.338     0.92  0.621
rs7305016   12q13.13  SMAGP               51642543       C            1.18  0.129     1.26  0.144     1.24  0.322
rs8041151   15q26.2   MCTP2, RGMA         94626455       A            0.92  0.452     0.87  0.417     0.93  0.710
rs17258206  16q21     LOC644649, GOT2     59381595       A            0.89  0.338     0.90  0.662     0.85  0.306
rs17258247  16q21     LOC644649, GOT2     59394794       A            0.87  0.245     0.84  0.457     0.84  0.278
rs925338    16q21     LOC644649, GOT2     59349532       A            0.92  0.450     0.90  0.640     0.89  0.471
rs9939608   16q21     LOC644649, GOT2     59392020       C            1.07  0.562     1.14  0.437     1.01  0.983
rs8068946   17q24.3   SLC39A11            70712513       A            0.94  0.601     0.92  0.584     0.94  0.780
rs7241996   18q12.2   LOC647946, MIR4318  35490215       A            1.12  0.415     1.73  0.189     1.07  0.689
rs10485620  20q13.2   ZFP64, SALL4        50565168       C            0.93  0.669     0.89  0.538     2.01  0.419

OR, odds ratio.



Table 4. Summary of SNP (rs10187911) replicated in other validation sets, replication 1 and replication 2

                                                                                All cases                   Adenocarcinoma only
Genotype  No. (%) of cases  No. (%) of adenocarcinoma  No. (%) of controls  Model      OR (95% CI)       P value  OR (95% CI)       P value

GWAS (285 cases and 1,455 controls)
GG        80 (28.2)         67 (27.1)                  283 (19.5)           Additive   1.47 (1.22-1.78)  < 0.001  1.40 (1.15-1.71)  0.001
GT        138 (48.6)        119 (48.2)                 699 (48.0)           Dominant   1.71 (1.25-2.34)  < 0.001  1.59 (1.14-2.20)  0.006
TT        66 (23.2)         61 (24.7)                  473 (32.5)           Recessive  1.66 (1.21-2.26)  0.002    1.56 (1.12-2.18)  0.008

Replication 1 (293 cases and 495 controls)
GG        93 (33.3)         87 (33.0)                  150 (30.7)           Additive   1.23 (0.99-1.53)  0.067    1.24 (0.99-1.55)  0.060
GT        140 (50.2)        135 (51.1)                 223 (45.7)           Dominant   1.58 (1.06-2.35)  0.025    1.65 (1.10-2.48)  0.016
TT        46 (16.5)         42 (15.9)                  115 (23.6)           Recessive  1.16 (0.83-1.61)  0.392    1.15 (0.82-1.61)  0.428

Replication 2 (546 cases and 744 controls)
GG        180 (33.6)        172 (34.3)                 207 (28.2)           Additive   1.15 (0.98-1.34)  0.082    1.16 (0.99-1.36)  0.064
GT        240 (44.9)        222 (44.2)                 360 (49.0)           Dominant   1.08 (0.83-1.42)  0.573    1.08 (0.82-1.42)  0.578
TT        115 (21.5)        108 (21.5)                 167 (22.8)           Recessive  1.32 (1.03-1.67)  0.027    1.35 (1.06-1.72)  0.017

Total (1,124 cases and 2,694 controls)
GG        353 (32.1)        326 (32.2)                 640 (23.9)           Additive   1.37 (1.23-1.51)  < 0.001  1.36 (1.23-1.52)  < 0.001
GT        518 (47.2)        476 (47.0)                 1,282 (47.9)         Dominant   1.51 (1.26-1.79)  < 0.001  1.49 (1.25-1.79)  < 0.001
TT        227 (20.7)        211 (20.8)                 755 (28.2)           Recessive  1.54 (1.31-1.81)  < 0.001  1.54 (1.31-1.82)  < 0.001

OR, odds ratio; CI, confidence interval; GWAS, genome-wide association study.



(92.4%), this SNP was significantly associated only with adeno- 
carcinoma. 

This SNP (rs1884396) is located in an intron of the NRXN1 gene,
which encodes one of the neurexins that function as cell adhesion
molecules during synaptogenesis and take part in intracellular signaling
(15). NRXN1 is extraordinarily large, occupying 1,112,039 bp of
DNA, nearly 0.04% of the entire human genome. Neurexin
transcripts undergo alternative splicing at several sites (15).
Each site of alternative splicing is defined as a position in the
mRNA where sequence variations have been observed. Therefore,
the SNP (rs1884396) in the intron of the NRXN1 gene may be
responsible for regulation of splicing events, leading to protein
diversity. In fact, NRXN1 is related to hemangiomas as well
as autism spectrum disorders, mental retardation, language
development delays, hypotonia, and the VACTERL association
(16-19). Moreover, Nussbaum et al. suggested involvement of
NRXN1 in nicotine dependence (20), which was associated with the
15q25 region in a previous lung cancer GWA study. Mutation of
epidermal growth factor receptor (EGFR) is known to be important
in chemotherapy of Asian lung cancer patients. Interestingly,
NRXN1 encodes the alpha isoform using an upstream
promoter, and this isoform has an epidermal growth factor
(EGF)-like sequence (16) which can bind to EGFR. Furthermore,
lung injury by several chemicals may in part be inferred
via curated interactions between NRXN1 and chemicals such
as tetrachlorodibenzodioxin (TCDD), ozone (O3), and chlorine
(Cl) (Comparative Toxicogenomics Database,
http://ctd.mdibl.org/detail.go?type=gene&acc=9378;
http://ctd.mdibl.org/detail.go?type=gene&acc=9378&view=ixn).

Fig. 2. Summary of odds ratio (OR) and confidence interval (CI) by center. The number of cases, OR, and 95% CI are given in order in brackets.
Plot labels: (160, 1.65, 1.31-2.08); Korea University (147, 1.30, 1.03-1.66); Korea University (137, 1.32, 1.03-1.69); Seoul National University Hospital (345, 1.32, 1.12-1.56); Seoul National University Hospital (318, 1.34, 1.13-1.59); Korea National Cancer Center (271, 1.24, 1.04-1.48); Korea National Cancer Center (271, 1.24, 1.04-1.48).

Fig. 3. Regional plot surrounding rs10187911.

In this study, we imputed other SNPs not explored in our
GWAS on lung cancer development using HapMap data and
found significant associations with other SNPs surrounding
rs10187911 (existing in the same intron) in NRXN1, indicating
that the rs10187911 we found is a true indicator related to
lung cancer development. However, we need to confirm which
one among these significant SNPs is a real marker of lung
cancer development in the future.

In this study, we did not confirm the previously reported SNPs
from GWA studies in Asian populations and found inconsistent
results among several reports (21-25). A study of lung cancer in
never-smoking Asian women (21) demonstrated that common
genetic variants (rs2736100) in the TERT-CLPTM1L locus on
chromosome 5p15.33 are associated with risk for lung
adenocarcinoma. Li et al. (22) demonstrated novel genetic variants at
13q31.3 as susceptibility markers for never-smoker lung cancer
in mostly American subjects. However, this study did not
replicate the significant association of 5p15.33 with lung cancers in
never-smokers. Ahn et al. (23) suggested the 18p11.22 region as
a novel lung cancer susceptibility locus in never-smokers in the
Korean population. This study observed a P value of 0.008 for
rs2736100, which did not reach genome-wide significance.
Unfortunately, we could not evaluate the significance of rs2736100 because
the locus was not included in the 497,345 markers of our GWA study.
We tried to estimate the effect of SNPs (rs2352028 at 13q31.3
and rs11080466 and rs11663246 at 18p11.22) reported from the
studies of Li et al. and Ahn et al. However, rs2352028 and
rs11663246 did not reach significance in our data (P = 0.838 for
rs2352028 and P = 0.421 for rs11663246), and rs11080466 was not
included in our markers. Another study of lung cancer in Han
Chinese (24) suggested three new lung cancer susceptibility
SNPs, rs753955 at 13q12.12 and rs17728461 and rs36600 at
22q12.2. However, rs753955 and rs17728461 did not reach
significance in our data (P = 0.382 for rs753955 and P = 0.537 for
rs17728461), and rs36600 was not included in our markers. A
recent study of lung cancer in never-smoking women in Asia (25)
indicated that rs7086803 of the VTI1A gene on chromosome
10q25.2 is associated with lung carcinogenesis. The inconsistent
observations in several GWA studies might be attributed to
different environmental exposures, ethnic differences (21, 22,
24), mixed histology (23, 24), sex (22-24), and smoking history
(23, 24), or different SNP markers (21, 22, 25).

The incidence of lung cancer in non-smoking Asian women
appears to be increasing and is predominantly adenocarcinoma.
Based on clinicopathological and molecular characteristics
of lung cancer arising in never-smoking females, adenocarcinoma
has been considered a different disease from other lung
cancer types. In the present study, all of our subjects were Korean
females and most of the subjects had adenocarcinoma. Therefore,
rs10187911, located in an intron of the NRXN1 gene at 2p16.3 and
found in this study, may be an important marker for adenocarcinoma
development in never-smoking females.

There were some limitations to this study. First, the sample
size in the present study was small. However, to protect against
false positives resulting from the small sample size, we used a
10-fold cross-validation method and confirmed that rs10187911
was replicated in all 10 trials with repeated random selection
from the total population. Furthermore, we replicated the same
analyses in other sample sets, replication 1 and 2, and obtained
consistent results for rs10187911. Second, we did not adjust for
center and histology despite differences in histology by center.
However, when we stratified the total population in the present
study by center only for adenocarcinoma, we did not find any
differences by center in the association between rs10187911
and lung cancer risk. Moreover, we found a small inflation
factor, indicating that the association we found is unlikely to result
from differences by center.
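
The resampling check referred to above (10 trials, each on a random 90% of the total population) can be sketched as follows. This is only an outline of the idea under stated assumptions: the function and parameter names are hypothetical, `p_value_fn` stands in for the age-adjusted logistic regression used in the study, and the threshold of 0.001 follows the Methods.

```python
import random


def replicated_in_all_trials(subjects, p_value_fn, n_trials=10, fraction=0.9,
                             alpha=0.001, seed=0):
    """Re-test an association on repeated random subsets of the subjects.

    subjects   : list of per-subject records (any structure p_value_fn accepts)
    p_value_fn : callable returning the association P value for a subset
    Returns (True if P < alpha in every trial, list of per-trial P values).
    """
    rng = random.Random(seed)  # fixed seed so the subsets are reproducible
    k = int(round(len(subjects) * fraction))
    p_values = []
    for _ in range(n_trials):
        subset = rng.sample(subjects, k)  # draw 90% without replacement
        p_values.append(p_value_fn(subset))
    return all(p < alpha for p in p_values), p_values
```

A SNP passes only if it stays below the threshold in every subset, which guards against associations driven by a small number of subjects.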

Although the association of NRXN1 and lung cancer has not 
been explored in epidemiologic study, previously reported data 
(15-20) support a role for this gene in the tumorigenesis of lung 
cancer. Nonetheless, the exact import of NRXN1 in lung cancer 
development remains largely unknown, and further biological 
and epidemiologic studies are needed. 

In conclusion, we have identified a novel genetic locus in the
2p16.3 region associated with susceptibility to adenocarcinoma
in Korean never-smoking females. Further replication studies in
larger populations are necessary to clarify our hypothesis.